NSF PAR Search | NSF Public Access Repository


How to track the economic impact of public investments in AI

https://doi.org/10.1038/d41586-024-01721-1

Lane, Julia; Owen-Smith, Jason; Weinberg, Bruce A (June 2024, Nature)

Government spending on artificial intelligence (AI) has surged across the world. Quantifying the return on research investments is notoriously difficult, especially in newly emerging economic sectors. Here, we propose a novel way to describe and analyze where AI ideas are being used and how they spread—by tracing the people and academic communities involved in AI research as they transition from government-funded research labs to private sector companies, carrying cutting-edge “AI know-how” with them. Linking existing university administrative data with state employment records allows several quantifiable inferences about the value of AI research to be drawn from these academia-to-industry migrations. Here we describe a pilot implementation of this system, which is being developed in the State of Ohio. It offers a template for governments and policy makers all over the world. Importantly, the metrics discussed below offer a way to measure the economic impact of scientific research in general, with implications for critical and emerging technologies that go far beyond AI.
Full Text Available
A Linked Data Mosaic for Policy-Relevant Research on Science and Innovation: Value, Transparency, Rigor, and Community

https://doi.org/10.1162/99608f92.1e23fb3f

Chang, Wan-Ying; Garner, Maryah; Basner, Jodi; Weinberg, Bruce; Owen-Smith, Jason (April 2022, Harvard data science review)

This article presents a new framework for realizing the value of linked data understood as a strategic asset and increasingly necessary form of infrastructure for policy-making and research in many domains. We outline a framework, the ‘data mosaic’ approach, which combines socio-organizational and technical aspects. After demonstrating the value of linked data, we highlight key concepts and dangers for community-developed data infrastructures. We concretize the framework in the context of work on science and innovation generally. Next we consider how a new partnership to link federal survey data, university data, and a range of public and proprietary data represents a concrete step toward building and sustaining a valuable data mosaic. We discuss technical issues surrounding linked data but emphasize that linking data involves addressing the varied concerns of wide-ranging data holders, including privacy, confidentiality, and security, as well as ensuring that all parties receive value from participating. The core of successful data mosaic projects, we contend, is as much institutional and organizational as it is technical. As such, sustained efforts to fully engage and develop diverse, innovative communities are essential.
Full Text Available
ORCID-linked labeled data for evaluating author name disambiguation at scale

https://doi.org/10.1007/s11192-020-03826-6

Kim, Jinseok; Owen-Smith, Jason (March 2021, Scientometrics)
How can we evaluate the performance of a disambiguation method implemented on big bibliographic data? This study suggests that the open researcher profile system, ORCID, can be used as an authority source to label name instances at scale. This study demonstrates the potential by evaluating the disambiguation performance of Author-ity2009 (which algorithmically disambiguates author names in MEDLINE) using 3 million name instances that are automatically labeled through linkage to 5 million ORCID researcher profiles. Results show that although ORCID-linked labeled data do not effectively represent the population of name instances in Author-ity2009, they do effectively capture the ‘high precision over high recall’ performance of Author-ity2009. In addition, ORCID-linked labeled data can provide nuanced details about Author-ity2009’s performance when name instances are evaluated within and across ethnicity categories. As ORCID continues to expand to include more researchers, labeled data obtained via ORCID linkage can better represent the full population of a disambiguated dataset and can be updated on a regular basis. This can benefit author name disambiguation researchers and practitioners who need large-scale labeled data but lack resources for manual labeling or access to other authority sources for linkage-based labeling. The ORCID-linked labeled data for Author-ity2009 are publicly available for validation and reuse.
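As a rough illustration of what evaluating a disambiguation result against linkage-based labels can look like, the sketch below computes pairwise precision and recall. The abstract does not state which metric the study uses, so the pairwise formulation is an assumption, and the function name, instance ids, cluster labels, and ORCID-style strings are all hypothetical.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, truth):
    """Compare two clusterings of the same name instances pair by pair.

    predicted: dict mapping instance id -> cluster label from the
               disambiguation method under evaluation.
    truth:     dict mapping instance id -> authority label
               (for example, a linked ORCID iD; hypothetical here).
    Only instances present in both dicts are evaluated.
    """
    shared = sorted(set(predicted) & set(truth))
    tp = fp = fn = 0
    for a, b in combinations(shared, 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1  # correctly merged pair
        elif same_pred:
            fp += 1  # merged, but the labels say different authors
        elif same_true:
            fn += 1  # split, but the labels say same author
    # With no pairs to judge, report 1.0 rather than divide by zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical toy data: the method splits one author into two clusters.
predicted = {"i1": "A", "i2": "A", "i3": "B", "i4": "C"}
truth = {"i1": "orcid-1", "i2": "orcid-1", "i3": "orcid-1", "i4": "orcid-2"}
precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall)
```

In this toy case the method never merges two different authors but does split one author across clusters, so precision stays at 1.0 while recall drops to one third, which is the kind of "high precision over high recall" pattern the abstract describes.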
Full Text Available
Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning

https://doi.org/10.1109/ACCESS.2020.3031112

Kim, Jinseok; Owen-Smith, Jason (January 2020, IEEE Access)
Generating automatically labeled data for author name disambiguation: an iterative clustering method

https://doi.org/10.1007/s11192-018-2968-3

Kim, Jinseok; Kim, Jinmo; Owen-Smith, Jason (January 2019, Scientometrics)

Full Text Available
Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning

https://doi.org/10.1002/asi.24459

Kim, Jinseok; Kim, Jenna; Owen‐Smith, Jason (February 2021, Journal of the Association for Information Science and Technology)

In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing the performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity‐based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity‐specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation.
